HW #2 – with ggplot, dplyr & troubleshooting
– please note that this is written as a notebook where I have tried differnt code and used the results to explore answers and then build upon each try.
– Also, please note that lines are not wrapping properly nor are paragrphs being created as they are coded in the rMarkdown code written here - something about the way the HTML code is being created through the “knitting’ process”
Resetting R’s memory
rm(list = ls())
Load libraries for the homework
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(gapminder)
library(ggplot2)
library(tidyr)
library(formatR)

Plot 1 - create graph of the gdpPercap (x) compared to life expectancy (Y) of each continent

Convert df to tbl and group by vaules - because this is what we did in class
lifeExp_gdp_df <-
  gapminder %>%
  group_by(continent, lifeExp, gdpPercap)
Then plot – like we did in class
ggplot(data = lifeExp_gdp_df, aes(x = gdpPercap, y = lifeExp)) + 
  geom_point(aes(shape = continent)) + 
  geom_point(aes(color = continent)) 

Plot 2 - Use “transformation” to linerise the data – lm? ug… ahh a hint on the hw page – so a “log scale”

yep… there is one in ggplot… so not the “lm” function I was trying to figure out_
ggplot(data = lifeExp_gdp_df, aes(x = gdpPercap, y = lifeExp)) + 
  geom_point(aes(shape = continent)) + 
  geom_point(aes(color = continent)) + scale_x_log10()

Plot 3 - Include a simple “linear fit” to the transformed area

googled “linear fit” to get “regression line” and then got – geom_smooth(method=lm)
ggplot(data = lifeExp_gdp_df, aes(x = gdpPercap, y = lifeExp)) + 
  geom_point(aes(shape = continent)) + 
  geom_point(aes(color = continent)) + scale_x_log10() + 
  geom_smooth(method=lm)

Plot 4 - plot the density functions of the life expectency for each continent

code below will caouse R to hang

ggplot(data = gapminder, aes(x = continent, y = , color = continent )) + geom_area(stat=“bin”) + facet_grid(lifeExp ~ continent)

Eventually leading to this error message mess… What in the code above brought on pages and pages of the text below? Posting just a part of it… hmmm…. unexpected variables… Further more… why isn’t this text and the code above wrapping to new lines as I am telling it to do??

…can result in unexpected behavior and will not be allowed in a future version of ggplot2. If you want y to represent counts of cases, use stat=“bin” and don’t map a variable to y. If you want y to represent values in the data, use stat=“identity”. See ?geom_bar for examples. (Defunct; last used in version 0.9.2) Error : Mapping a variable to y and also using stat=“bin”. With stat=“bin”, it will attempt to set the y value to the count of cases in each group. This can result in unexpected behavior and will not be allowed in a future version of ggplot2. If you want y to represent counts of cases, use stat=“bin” and don’t map a #variable to y. If you want y to represent values in the data, use stat=“identity”. See ?geom_bar for examples. (Defunct; last used in version 0.9.2) Error : Mapping a variable to y and also using stat=“bin”. With stat=“bin”, it will attempt to set the y value to the count of cases #in each group. This can result in unexpected behavior and will not be allowed in a future #version of ggplot2. If you want y to represent counts of cases, use stat=“bin” and don’t map a variable to y. If you want y to represent values in the data, use stat=“identity”. See ?geom_bar for examples. (Defunct; last used in version 0.9.2)

ok - so no “y”?? … nope y=lifeExp

well - something popped up but then disappeared.. ??
ggplot(data = gapminder, aes(x = continent, y = lifeExp, color = continent )) +  
geom_boxplot() 

Somethings very not right… no plots being displayed anymore…

reset Rstudio… trying something new
gdp_df <-
  gapminder %>%
  group_by(continent, lifeExp)

ggplot(data =gdp_df, aes(x = gdpPercap, y = lifeExp)) + 
  geom_point(aes(shape = continent)) + 
  geom_point(aes(color = continent)) 

ok… this looks like it works - refining
ggplot(data = gdp_df, aes(x = continent, y = lifeExp, color = continent)) + 
  geom_boxplot() +
  facet_grid(year~continent)

wrong plot type but the grid looks promising
ggplot(data = gdp_df, aes(x = continent, y = lifeExp, color = continent)) + 
  geom_boxplot() +
  facet_wrap(year~continent)

nope…
ggplot(data = gdp_df, aes(x = continent, y = lifeExp, color = continent)) + 
  geom_area() +  facet_wrap(~year)

nope…
ggplot(data = gdp_df, aes(x = lifeExp, y = , color = continent, fill = continent)) + 
  geom_density(color = 'black', alpha = 0.5) +  facet_wrap(~year)

got it with Nelson’s help missing the ‘fill’ part as well as the density params

Plot 5 - fix the wording… create the plot first of course…. well - it’s a boxplot based on the above trials…

ggplot(data = gdp_df, aes(x = continent, y = lifeExp, color = continent)) + 
  geom_boxplot() +  facet_wrap(~year)

(ummm… just widen the screen… mine looks just fine at full screen..??)
ggplot cheat sheet says something about using scale function when looking at the “Labels” tab… don’t see how to do this..)
ggplot(data = gdp_df, aes(x = continent, y = lifeExp, color = continent)) + 
  geom_boxplot() + 
  facet_wrap(~year) + 
  theme_minimal()

like the minimal theme - but it didn’t fix the text… looks like it could be “scale_fill_discrete”
ggplot(data = gdp_df, aes(x = continent, y = lifeExp, color = continent)) + 
  geom_boxplot() + 
  facet_wrap(~year) + 
  theme_minimal() + 
  scale_fill_discrete()

giving up… googling it… yep - This fixes it
ggplot(data = gdp_df, aes(x = continent, y = lifeExp, color = continent)) + 
  geom_boxplot() +  
  facet_wrap(~year) + 
  theme(axis.text.x=element_text(angle=50, size=10, vjust=0.5)) + 
  scale_fill_discrete()

Plot 6 - Use dplyr to calc summary stats (copying text from dplr class and working from that)
(make sure “dplyr” reloads when restarting RStudio)
select(gapminder)
## data frame with 0 columns and 1704 rows
tbl_df(gapminder)
## Source: local data frame [1,704 x 6]
## 
##        country continent  year lifeExp      pop gdpPercap
##         (fctr)    (fctr) (dbl)   (dbl)    (dbl)     (dbl)
## 1  Afghanistan      Asia  1952  28.801  8425333  779.4453
## 2  Afghanistan      Asia  1957  30.332  9240934  820.8530
## 3  Afghanistan      Asia  1962  31.997 10267083  853.1007
## 4  Afghanistan      Asia  1967  34.020 11537966  836.1971
## 5  Afghanistan      Asia  1972  36.088 13079460  739.9811
## 6  Afghanistan      Asia  1977  38.438 14880372  786.1134
## 7  Afghanistan      Asia  1982  39.854 12881816  978.0114
## 8  Afghanistan      Asia  1987  40.822 13867957  852.3959
## 9  Afghanistan      Asia  1992  41.674 16317921  649.3414
## 10 Afghanistan      Asia  1997  41.763 22227415  635.3414
## ..         ...       ...   ...     ...      ...       ...
group_by(gapminder)
## Source: local data frame [1,704 x 6]
## 
##        country continent  year lifeExp      pop gdpPercap
##         (fctr)    (fctr) (dbl)   (dbl)    (dbl)     (dbl)
## 1  Afghanistan      Asia  1952  28.801  8425333  779.4453
## 2  Afghanistan      Asia  1957  30.332  9240934  820.8530
## 3  Afghanistan      Asia  1962  31.997 10267083  853.1007
## 4  Afghanistan      Asia  1967  34.020 11537966  836.1971
## 5  Afghanistan      Asia  1972  36.088 13079460  739.9811
## 6  Afghanistan      Asia  1977  38.438 14880372  786.1134
## 7  Afghanistan      Asia  1982  39.854 12881816  978.0114
## 8  Afghanistan      Asia  1987  40.822 13867957  852.3959
## 9  Afghanistan      Asia  1992  41.674 16317921  649.3414
## 10 Afghanistan      Asia  1997  41.763 22227415  635.3414
## ..         ...       ...   ...     ...      ...       ...
ggplot(data = gapminder, aes(x = lifeExp, y = , fill = continent)) +
  geom_density(color = "black", alpha = 0.5)

Plot 7 - Life expectancy in Asia

a_data <- filter(gapminder, continent == "Asia")
ggplot(data = a_data, aes(x = lifeExp, y = , color = continent, 
                              fill = continent)) +
  scale_fill_manual(values = c("green")) + 
  geom_density(color = "black", alpha = 0.5) + 
  geom_vline(aes(xintercept = mean(lifeExp))) +
  theme_minimal() + 
  ggtitle("Life expectancy in Asia")

Dplyr Plot 8a - Create a dataframe of the mean life expectancies for each continent

mean_lifeExp <- 
  gapminder %>%
  group_by(continent) %>%
  summarize(avg = mean(lifeExp))
Summarize takes big thing and makes it smaller

Dplyr Plot 8b - Plot the densities of the life expectancies for each continent and draw a line to mark the mean for each continent

ggplot(data = gapminder, aes(x = lifeExp, y = , colour = continent, 
                             fill = continent)) +
  facet_wrap(~continent) + 
  geom_density(color = "black", alpha = 0.6) + 
  geom_vline(data = mean_lifeExp, aes(xintercept = avg))

Troubleshooting – Finding mistakes in listed code
Example code is displayed only - and not run, as…, um…, duh… it generates an error
library(ggplot2)
hw_gapminder <- read.csv('/Users/lyman/Dropbox (Byrnes Lab)/R_projects/Intro to R_Fall 2015/hw_gapminder.csv')
1)

mean-lifeExp <- mean(hw_gapminder$lifeExpe)

Ha! The first example - the extra ‘e’ at the end of ‘lifeExpe’ was autofixed when I typed it in. Nice!
mean_lifeExp <- mean(hw_gapminder$lifeExp)
2)

small-set <- hw_gapminder[c(1, 2, 3, 4, 1300:1304), (‘country’, ‘continent’, ‘year’)]

Ha again! auto formatting has already picked out the lonely commas showing where the error is… which looks like a ‘cat’ problem
small_set <- hw_gapminder[c(1, 2, 3, 4, 1300:1304), c('country', 'continent', 'year')]
Yup - concatenate issue
3)

mean-gdp <- mean(hw_gapminder$gdpPercap, na.rm = TRUE)

Running this returns a “NA_real_ value error. From class - you have check data and in this case NA is not a number you can get the mean of — thanks also to Marc for pointing this out AND giving me the ‘na.rm = TRUE’ answer
mean_gdp <- mean(hw_gapminder$gdpPercap, na.rm = TRUE)
4)

max-country <- hw-gapminder\(country [which(hw-gapminder\)lifeExp = max(hw-gapminder$lifeExp))]

Equal sign error – need the second ‘=’ to make ‘==’ as ‘equal to’
max_country <- hw_gapminder$country[which(hw_gapminder$lifeExp == max(hw_gapminder$lifeExp))]